Creators/Authors contains: "Wu, Edward"


  1. Randomized controlled trials (RCTs) admit unconfounded design-based inference (randomization largely justifies the assumptions underlying statistical effect estimates) but often have limited sample sizes. However, researchers may have access to big observational data on covariates and outcomes from RCT nonparticipants. For example, data from A/B tests conducted within an educational technology platform exist alongside historical observational data drawn from student logs. We outline a design-based approach to using such observational data for variance reduction in RCTs. First, we use the observational data to train a machine learning algorithm that predicts potential outcomes from covariates, and we then use that algorithm to generate predictions for the RCT participants. Then, we use those predictions, perhaps alongside other covariates, to adjust causal effect estimates with a flexible, design-based covariate-adjustment routine. In this way, there is no danger of biases from the observational data leaking into the experimental estimates, which are guaranteed to be exactly unbiased regardless of whether the machine learning models are “correct” in any sense or whether the observational samples closely resemble the RCT samples. We demonstrate the method by analyzing 33 randomized A/B tests and show that it decreases standard errors relative to other estimators, sometimes substantially. (A minimal code sketch of this workflow appears first below this list.)
  2. In paired experiments, participants are grouped into pairs with similar characteristics, and one observation from each pair is randomly assigned to treatment. The resulting treatment and control groups should be well balanced; however, small chance imbalances may still remain. Building on work for completely randomized experiments, we propose a design-based method to adjust for covariate imbalances in paired experiments. We leave out each pair and impute its potential outcomes using any prediction algorithm, such as the lasso or random forests. This method addresses a trade-off unique to paired experiments and, in doing so, has the potential to improve precision over existing methods. (A simplified code sketch of the leave-one-pair-out step appears second below this list.)
  3. Background: When conducting a randomized controlled trial, it is common to specify in advance the statistical analyses that will be used to analyze the data. Typically, these analyses will involve adjusting for small imbalances in baseline covariates. However, this poses a dilemma: adjusting for too many covariates can hurt precision more than it helps, and it is often unclear which covariates are predictive of the outcome before the experiment is conducted. Objectives: This article aims to produce a covariate adjustment method that allows for automatic variable selection, so that practitioners need not commit to any specific set of covariates before seeing the data. Results: We propose the “leave-one-out potential outcomes” estimator. We leave out each observation and then impute that observation’s treatment and control potential outcomes using a prediction algorithm such as a random forest. In addition to allowing for automatic variable selection, this estimator is unbiased under the Neyman–Rubin model, generally performs at least as well as the unadjusted estimator, and relies on statistical assumptions that are largely justified by the experimental randomization. (A code sketch of a leave-one-out estimator of this type appears third below this list.)
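To make the workflow in item 1 concrete, here is a minimal sketch, not the article's exact estimator: a gradient-boosting model is trained only on simulated observational records, its predictions for the RCT participants serve as an auxiliary covariate, and the treatment effect is estimated from the difference in mean prediction residuals. The data, the choice of model, and the simple residualization step are all illustrative assumptions.

```python
# Sketch: observational data -> outcome model -> design-based adjustment.
# Everything below (data, model, residualization) is illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Simulated observational (nonparticipant) records: covariates and outcomes.
n_obs, n_rct, p = 5000, 200, 10
X_obs = rng.normal(size=(n_obs, p))
y_obs = X_obs[:, 0] + 0.5 * X_obs[:, 1] + rng.normal(size=n_obs)

# Simulated RCT records: covariates, randomized assignment, observed outcomes.
X = rng.normal(size=(n_rct, p))
z = rng.permutation(np.repeat([0, 1], n_rct // 2))   # complete randomization
y = X[:, 0] + 0.5 * X[:, 1] + 0.3 * z + rng.normal(size=n_rct)

# Step 1: train the predictor on observational data only.
model = GradientBoostingRegressor().fit(X_obs, y_obs)

# Step 2: generate predictions for the RCT participants.
y_hat = model.predict(X)

# Step 3: adjust the difference in means using the predictions.
# Randomization makes z independent of y_hat, so residualizing against
# y_hat cannot bias the estimate, however poor the observational model is.
resid = y - y_hat
tau_unadj = y[z == 1].mean() - y[z == 0].mean()
tau_adj = resid[z == 1].mean() - resid[z == 0].mean()
print(f"unadjusted: {tau_unadj:.3f}   adjusted: {tau_adj:.3f}")
```

Because the prediction model never sees the RCT assignments or outcomes, any bias in the observational data affects only how much variance is removed, not the validity of the experimental estimate.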
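The leave-one-pair-out idea in item 2 can be sketched as follows, again with simulated data: for each pair, a random forest is fit on all other pairs (with treatment status as a feature), both potential outcomes are imputed for the two held-out units, and the pair-level difference is adjusted with those imputations. The particular pair-level adjustment used here is one unbiased possibility; the article's estimator may be parameterized differently.

```python
# Sketch of a leave-one-pair-out adjustment for a paired design (illustrative).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n_pairs, p = 60, 4
pair_means = rng.normal(size=(n_pairs, 1, p))
X = pair_means + 0.3 * rng.normal(size=(n_pairs, 2, p))  # units in a pair are similar
Z = rng.binomial(1, 0.5, size=n_pairs)       # Z[k] = 1 -> unit 0 of pair k is treated
treated = np.stack([Z, 1 - Z], axis=1)       # treatment indicator for each unit
y = X[:, :, 0] + 0.4 * treated + rng.normal(scale=0.5, size=(n_pairs, 2))

tau_k = np.empty(n_pairs)
for k in range(n_pairs):
    keep = np.arange(n_pairs) != k           # leave pair k out entirely
    # Fit on all units from the other pairs, with treatment as a feature.
    X_tr = np.vstack([np.column_stack([X[keep, u], treated[keep, u]]) for u in (0, 1)])
    y_tr = np.concatenate([y[keep, 0], y[keep, 1]])
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    # Impute both potential outcomes for both units of the held-out pair.
    t_hat = model.predict(np.column_stack([X[k], np.ones(2)]))
    c_hat = model.predict(np.column_stack([X[k], np.zeros(2)]))
    # The imputations do not depend on Z[k], so this pair-level estimate is
    # unbiased over the within-pair coin flip that assigns treatment.
    if Z[k] == 1:                            # unit 0 treated, unit 1 control
        correction = (y[k, 0] - t_hat[0]) - (y[k, 1] - c_hat[1])
    else:                                    # unit 1 treated, unit 0 control
        correction = (y[k, 1] - t_hat[1]) - (y[k, 0] - c_hat[0])
    tau_k[k] = (t_hat - c_hat).mean() + correction

d_unadj = np.where(Z == 1, y[:, 0] - y[:, 1], y[:, 1] - y[:, 0])
print("adjusted paired estimate:  ", tau_k.mean())
print("unadjusted paired estimate:", d_unadj.mean())
```

Because each pair's imputations come only from the other pairs, the adjustment stays design-based: no modeling assumptions are needed for unbiasedness, only the randomization within each pair.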
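Finally, item 3's leave-one-out potential outcomes idea can be sketched in the same style. For each observation, a random forest is fit on the remaining data, the held-out observation's treated and control outcomes are imputed, and the imputations enter an assignment-weighted estimate that is unbiased because they do not depend on the held-out unit's assignment. The simulated data, the single-forest imputation scheme, and the p = 1/2 weighting are illustrative assumptions; the article's exact formulation may differ.

```python
# Sketch of a leave-one-out potential outcomes estimator (illustrative).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n, p = 80, 5
X = rng.normal(size=(n, p))
T = rng.binomial(1, 0.5, size=n)             # Bernoulli(1/2) randomization
y = X[:, 0] + 0.5 * T + rng.normal(scale=0.5, size=n)

tau_i = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i                 # leave observation i out
    # One simple imputation scheme: a single forest on (covariates, treatment),
    # queried at T = 1 and T = 0 for the held-out observation.
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(np.column_stack([X[keep], T[keep]]), y[keep])
    t_hat = model.predict(np.column_stack([X[i:i + 1], [[1]]]))[0]
    c_hat = model.predict(np.column_stack([X[i:i + 1], [[0]]]))[0]
    # t_hat and c_hat never use observation i, so this term has expectation
    # equal to unit i's treatment effect under the p = 1/2 randomization.
    tau_i[i] = (t_hat - c_hat) \
        + 2 * T[i] * (y[i] - t_hat) \
        - 2 * (1 - T[i]) * (y[i] - c_hat)

print("leave-one-out adjusted estimate:", tau_i.mean())
print("difference in means:           ", y[T == 1].mean() - y[T == 0].mean())
```

Refitting the model n times, as above, is the naive implementation; with random forests, out-of-bag predictions can supply essentially the same leave-one-out imputations at a fraction of the cost.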